feat: copy_behaviors to make sub-classing easy #3137
Any update on the tests for this PR?
Hi, this already has tests! I think I forgot to request a review.
I edited this PR to remove the continue. (There was a continue in the previous function, unique_list, because it was following a memoization pattern that should fail early with minimal fanfare if an item is in the memo. The copy_behaviors function is different: not isinstance(key, str) and "*" not in key is part of the basic logic.)

Along the way, I noticed something more dangerous: copy_behaviors modifies the behavior in place, and if there's an error part-way through, it will leave the behavior in an intermediate state, which is likely invalid. At the least, it should not apply its changes until there's no chance of failure. But, better still, it can return new behaviors for the caller to update, rather than change them in place. This is a different API:

ak.behavior.update(ak._util.copy_behaviors(existing_class, new_class, ak.behavior))

The fact that ak.behavior has to be mentioned twice is a feature: maybe the user wants the new behaviors in some different dict.
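
A minimal sketch of that difference, using a plain dict and a hypothetical renamed_entries helper rather than the real ak._util.copy_behaviors (the string "add" stands in for numpy.add). Because the helper only returns new entries, the source dict is untouched until the caller decides where the result goes:

```python
def renamed_entries(old_name, new_name, behavior):
    """Hypothetical stand-in: build new entries without touching `behavior`."""
    output = {}
    for key, value in behavior.items():
        if isinstance(key, tuple) and old_name in key:
            new_key = tuple(new_name if k == old_name else k for k in key)
            output[new_key] = value
    return output


# Stand-in for ak.behavior; "add" stands in for numpy.add.
behavior = {("add", "Old", "Old"): "add_impl"}
snapshot = dict(behavior)

new_entries = renamed_entries("Old", "New", behavior)
unchanged_after_call = behavior == snapshot  # the call itself modifies nothing

# The caller applies the result to the same dict...
behavior.update(new_entries)

# ...or to a separate dict, leaving the original registry alone.
separate = {}
separate.update(renamed_entries("Old", "New", snapshot))
```

If the helper raised part-way through, neither dict would have been changed, which is the point of returning the entries instead of writing them in place.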
I'm still confused about why these two cases are excluded:
- key is a single string, such as existing_class.__name__
- key includes a "*", which indicates that a class may be inside of nested lists.
As it is, copy_behaviors would copy everything except the ak.Record subclass (e.g. SomeObject with methods) and its vectorized versions (e.g. ArrayOfSomeObject with vectorized methods). These will later be replaced by overloads, but so will some of the NumPy ufunc or Numba typing/lowering overloads; I don't know why these are singled out. But what copy_behaviors should do is determined by its use, and I haven't seen how it's intended to be used.

I see that you fixed the rapidjson directory, so this is ready to merge, if my confusion above is unfounded. Otherwise, you can remove the check for "*" not in key and add a special case for key == existing_class.__name__.
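
To make the exclusion concrete, here is a small sketch that applies the quoted condition to three representative keys. The is_copied helper and the key values are illustrative assumptions (with "add" standing in for numpy.add), not code from the PR:

```python
def is_copied(key):
    # Condition quoted in the review: only non-string keys without "*" pass.
    return not isinstance(key, str) and "*" not in key


keys = [
    "SomeObject",                         # record class key
    ("*", "SomeObject"),                  # array-of-records class key
    ("add", "SomeObject", "SomeObject"),  # stand-in for a ufunc overload key
]

copied = [k for k in keys if is_copied(k)]
skipped = [k for k in keys if not is_copied(k)]
```

Only the overload-style tuple key passes; the record key and the "*" key are the two cases that get skipped.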
I'm merging this first version of copy_behaviors now, but as I said above, I think it should do a different thing (not skip the keys it's skipping). That can come later.
Hi @jpivarski, thanks for the review and fixing the PR! Sorry for taking too long to respond here. The MWE below shows why I added the two conditions:

import awkward as ak
import numpy


class SuperVector:
    def add(self, other):
        return ak.zip(
            {"x": self.x + other.x, "y": self.y + other.y},
            with_name="VectorTwoD",
            behavior=self.behavior,
        )


@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)


ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)
print(ak.behavior)


@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass


print(ak.behavior)
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>
# 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>

ak.behavior.update(
    ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior)
)
print(ak.behavior)
# ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDArray'> -> removing the star condition (the class is wrong)
# ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'> -> removing the string condition

Additionally, replacing the generic string condition with a more specific case, if oldname != key:, leads to:

{'VectorTwoD': <class '__main__.VectorTwoDRecord'>,
 ('*', 'VectorTwoD'): <class '__main__.VectorTwoDArray'>,
 (<ufunc 'add'>, 'VectorTwoD', 'VectorTwoD'): <function <lambda> at 0x1050a7ba0>,
 'VectorTwoDAgain': <class '__main__.VectorTwoDAgainRecord'>,
 ('*', 'VectorTwoDAgain'): <class '__main__.VectorTwoDAgainArray'>,
 (<ufunc 'add'>, 'VectorTwoDAgain', 'VectorTwoDAgain'): <function <lambda> at 0x1050a7ba0>,
 ('V', 'e', 'c', 't', 'o', 'r', 'T', 'w', 'o', 'D', 'A', 'g', 'a', 'i', 'n'): <class '__main__.VectorTwoDAgainRecord'>}

Could you please let me know what should be done here? I tested the current implementation with Coffea and everything works as expected. (I pointed out that there is a bug in
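
One plausible reading of the character-tuple key, sketched with plain Python: for a string key, `in` is a substring test, and iterating a string yields its characters. The rename helper below is an assumed shape of the renaming step without the string check, not the merged code, and "add" stands in for numpy.add:

```python
old_name = "VectorTwoD"
new_name = "VectorTwoDAgain"


def rename(key):
    # Assumed renaming step with no isinstance(key, str) guard.
    if old_name in key:  # substring test when key is a str
        return tuple(new_name if k == old_name else k for k in key)
    return None


tuple_key = ("add", "VectorTwoD", "VectorTwoD")
string_key = "VectorTwoDAgain"  # contains "VectorTwoD" as a substring

renamed_tuple = rename(tuple_key)
renamed_string = rename(string_key)  # a tuple of single characters
```

This would also explain why an `oldname != key` check does not help: the string 'VectorTwoDAgain' is not equal to the old name, yet it still contains it.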
You definitely want to check for the I was assuming that you'd copy all of the behaviors, including the

import typing


def copy_behaviors(existing_class: typing.Any, new_class: typing.Any, behavior: dict):
    output = {}
    oldname = existing_class.__name__
    newname = new_class.__name__
    for key, value in behavior.items():
        if isinstance(key, str):
            # String keys are matched exactly, never by substring.
            if key == oldname:
                output[newname] = value
        else:
            # Tuple keys: swap every occurrence of the old name.
            if oldname in key:
                new_tuple = tuple(newname if k == oldname else k for k in key)
                output[new_tuple] = value
    return output

and then define the new class after having copied the behaviors:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)


ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors(VectorTwoD, VectorTwoDAgain, ak.behavior))


class VectorTwoDAgain(VectorTwoD):
    pass


ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

But now I see the problem. You were using the This is going to be a problem because the The This should be enough:

def copy_behaviors(from_name: str, to_name: str, behavior: dict):
    output = {}
    for key, value in behavior.items():
        if isinstance(key, str):
            if key == from_name:
                output[to_name] = value
        else:
            if from_name in key:
                new_tuple = tuple(to_name if k == from_name else k for k in key)
                output[new_tuple] = value
    return output

which would then be used like this:

class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)


ak.behavior["VectorTwoD"] = VectorTwoD
ak.behavior["*", "VectorTwoD"] = VectorTwoD
ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))


class VectorTwoDAgain(VectorTwoD):
    pass


ak.behavior["VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior["*", "VectorTwoDAgain"] = VectorTwoDAgain
ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

or like this:

@ak.mixin_class(ak.behavior)
class VectorTwoD(SuperVector):
    def __eq__(self, other):
        return ak.all(self.x == other.x) and ak.all(self.y == other.y)


ak.behavior[numpy.add, "VectorTwoD", "VectorTwoD"] = lambda v1, v2: v1.add(v2)

ak.behavior.update(ak._util.copy_behaviors("VectorTwoD", "VectorTwoDAgain", ak.behavior))


@ak.mixin_class(ak.behavior)
class VectorTwoDAgain(VectorTwoD):
    pass


ak.behavior[numpy.add, "VectorTwoDAgain", "VectorTwoDAgain"] = lambda v1, v2: v1.add(v2)

The general strategy, for a developer who is using the
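
For anyone who wants to check the name-based version without Awkward Array installed, here is a sketch that runs the same logic on a plain dict. The string values and the "add" key element are placeholders for the real classes and numpy.add, and the "Unrelated" entry is added only to show that other keys are left out:

```python
def copy_behaviors(from_name, to_name, behavior):
    # Same logic as the name-based function quoted above.
    output = {}
    for key, value in behavior.items():
        if isinstance(key, str):
            if key == from_name:
                output[to_name] = value
        else:
            if from_name in key:
                new_tuple = tuple(to_name if k == from_name else k for k in key)
                output[new_tuple] = value
    return output


# Placeholder registry: strings stand in for classes and functions.
behavior = {
    "VectorTwoD": "RecordClass",
    ("*", "VectorTwoD"): "ArrayClass",
    ("add", "VectorTwoD", "VectorTwoD"): "add_impl",
    "Unrelated": "OtherClass",
}

copied = copy_behaviors("VectorTwoD", "VectorTwoDAgain", behavior)
```

Note that the copied record and "*" entries still point at the old classes, which is why both usage patterns above define the new class after the copy so that its own registrations overwrite them.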
Thank you for the detailed explanation! I went through it and have created a PR with the fix.
How should I interpret the fact this is in the
XRef #2433